Abstract: The goal of this research is to create a reference
data model for the educational and research institutes of the
Iranian Ministry of Science, Research and Technology. After
investigating existing technologies and considering the
problem context, ontology was chosen as the data model
format. To create the target ontology, an ontology
construction methodology was designed and implemented.
This methodology was created using the design science
research method and comprises an architecture, a detailed
workflow process, a guideline for performing each step, and
related software in an integrated web-based system. The
system is implemented in PHP, is available as open source,
and was used as the main tool to construct the target
ontology. The proposed methodology leverages three main
knowledge sources: textual documents, existing ontologies in
the higher education domain, and reverse engineering of a
relational database of an integrated university system. The
resulting product was evaluated against the data requirements
of the Ministry of Science, Research and Technology, and its
shortcomings were resolved. The novelty of this work lies in
both the generated product, that is, a localized reference
data model, and the ontology construction methodology.


Keywords: Ontology Development; Higher Education 
Ontology; Ontology Learning. 


1. Introduction 

This research was conducted according to the request of the 
Ministry of Science, Research and Technology (MSRT) of 
Iran for creating a reference ontology tailored to the 
educational and research domain in the higher education 
business. 

Ontology is “a formal explicit specification of a shared
conceptualization” [1] and is expressed in a machine-readable
language. In information technology, an ontology is
considered an information artifact that models the knowledge
of a specific domain [2] and consists of classes (the
representations of real-world concepts), hierarchical
relations between classes, data properties (expressing class
attributes), and object properties (non-hierarchical relations
between classes).

Ontologies can be constructed by using three types of 
knowledge resources: unstructured (such as text documents), 
semi-structured (such as HTML files) and structured (such 
as relational databases) resources [3]. MSRT required us to 
cover at least the following list of knowledge sources: 

1. Statistical concepts of science, research and technology
that are mentioned in two main books published by MSRT;

2. Data objects that are stored in an active higher education
Enterprise Resource Planning (ERP) software in Iran;

3. All existing ontologies in this domain.

Ontology construction is an expensive and tedious task 
and must be done in a systematic way by applying a proper 
methodology. 

Previous research on creating higher education
ontologies since 2010 is listed in Table 1; for each
study, its methodology, resources, tools, and product are
identified.


Table 1. The summary of previous work

Research | Methodology | Resources | Tools | Product
Satyamurty, Murthy, & Raghava [4] | Unknown | Unknown | Protégé | -
Hadjar [5] | Adopted from Enterprise Ontology [6] | Some universities' organizational charts and executives of Ahlia University | Protégé | -
Zemmouchi-Ghomari & Ghomari [7] | Adopted from Neon [8] | Text documents and some web sites | Neon | HERO
Ameen, Khan, & Rani [9] | Proposed 7 methods without details | Unknown | Protégé | -
Malik, Prakash, & Rizvi [10] | Unknown | Unknown | - | -


As shown in Table 1, only two detailed methodologies 
have been used in previous work, that is, Enterprise ontology 
and Neon. 

In order to find a suitable methodology for our project, we
further searched for other popular ontology construction
methodologies. To compare the search results, we used a
framework adapted from [11] that focuses on activity
categories in the construction process (1. Management, 2.
Pre-Development, 3. Development, 4. Post-Development,
and 5. Support) and added our own criteria: supporting
multiple languages (Persian in particular), having technical
tools, and having detailed guidelines and algorithms (to
support at least unstructured and structured knowledge
sources). The result of this comparison is presented in Table 2.

Table 2. Ontology development methodologies comparison

Methodology | Supported activity category (1 2 3 4 5) | Multi-language support | Tools | Detailed guidelines
Enterprise Ontology [6] | - - X - X | - | - | No
METHONTOLOGY [12] | X - X X X | - | X | Partly
TOVE [13] | - - X X | - | - | No
Ontology Development 101 [14] | - - X - X | - | - | No
DILIGENT [15] | X - - X X | - | - | No
UPON [16] | - - X X X | - | - | No
On-To-Knowledge [17] | X X X X X | - | X | No
Neon [8] | X X X X X | Localization | X | Partly
Figure 1. The general methodology of design science research [18]


As shown in Table 2, no methodology fits all criteria.
We therefore designed a new methodology by using the
general cycle of Design Science Research (DSR), shown in
Figure 1, and then used this methodology to create the
final product.

In the “awareness of problem” step of DSR, we investigated
different methods for constructing an ontology from three
types of knowledge sources: ontology learning from texts,
ontology learning from relational databases, and creating an
ontology by merging existing ontologies.


Ontology learning¹ from texts:

Researchers have suggested several semi-automated methods for
learning ontologies from texts. These methods can be
categorized into three approaches: linguistic, statistical, and
mixed [19].

To learn an ontology from text, we selected a mix of the
TF-IDF² [20] and co-occurrence analysis [21] techniques
from the statistical approach, and the WordNet technique from
the linguistic approach.


¹ Semi-automated ontology construction is also called ontology learning.
² Term Frequency (TF) - Inverse Document Frequency (IDF)


Ontology learning from relational databases:

Researchers have proposed different methods and techniques
for extracting an ontology from a relational database. These
methods can be categorized into two main approaches:
creating an ontology based on the database schema, and
creating a domain-specific ontology [22]. The first approach
relies on the database schema and does not consider table
contents or other meta-data such as existing vocabularies.
The second approach considers the database content and also
the knowledge of domain experts. In this work we focused on
the second approach.

The domain-specific approach is further divided into
two sub-approaches: no-reverse engineering and reverse
engineering [22]. In the no-reverse engineering approach,
an RDF graph of the database content is created and mapped
to an ontology by experts (mostly manually). This approach is
not suitable for large databases because the graph becomes
too large to create and investigate. In this research, our
knowledge source is a higher education ERP database that
contains more than 2000 tables, so we focused on the database
re-engineering approach.

Re-engineering methods use rules for transferring
database entities to ontology elements. The following are the
most used transfer rules [22]:

1. Default rules: These rules are adapted from the Berners-Lee
rules [23]. Briefly, they transfer tables to
classes, non-foreign key fields to data properties, foreign
key fields to object properties, and table records to
instances.

2. Binary relationship rule: This rule identifies tables that
are designed to link two tables and transfers them to object
properties.

3. Hierarchy class rule: If the primary key of a table is a
foreign key to the primary key of another table, there is a
subclass-superclass relation between their mapped
classes.

4. Weak entities rule: If a table has a composite primary key
that contains a foreign key to another table, the mapped
classes have a “part-of” relationship.

5. N-ary relationship rule: If a primary key consists of
foreign keys to more than one table, it should be broken
into binary relationships.

6. Fragmentation rule: If several tables have the same primary
key, they should be integrated into one class.

7. Constraint rule: These rules exploit additional schema
constraints that are present in SQL DDL statements
(such as non-nullable and unique constraints).

8. Datatype rule: Transfer SQL datatypes to value constraints
in the ontology.
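As an illustration of how such transfer rules operate, the following is a minimal Python sketch of the default rules (rule 1) and the hierarchy class rule (rule 3). The `Table` structure, the `map_schema` function, and the sample table names are hypothetical and are not taken from the Ontirandoc implementation; the sketch assumes that all foreign keys are declared.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical, simplified description of one relational table."""
    name: str
    columns: list[str]
    primary_key: list[str]
    # column name -> name of the referenced table
    foreign_keys: dict[str, str] = field(default_factory=dict)


def map_schema(tables: list[Table]) -> dict:
    """Apply rule 1 (default rules) and rule 3 (hierarchy class rule)."""
    ontology = {"classes": [], "subclass_of": [],
                "data_properties": [], "object_properties": []}
    for t in tables:
        ontology["classes"].append(t.name)  # rule 1: table -> class
        for col in t.columns:
            target = t.foreign_keys.get(col)
            if target is None:
                # rule 1: non-foreign key field -> data property
                ontology["data_properties"].append((col, t.name))
            elif t.primary_key == [col]:
                # rule 3: the primary key is itself a foreign key -> subclass
                ontology["subclass_of"].append((t.name, target))
            else:
                # rule 1: foreign key field -> object property (name, domain, range)
                ontology["object_properties"].append((col, t.name, target))
    return ontology
```

In this sketch a `student` table whose primary key `person_id` references `person` would be reported as a subclass of `person`, while its `dept_id` foreign key becomes an object property with range `department`.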

The above rules rest on some assumptions: the database is in
3NF, tables and fields have meaningful names, and all foreign
keys are defined in the database schema. These assumptions
often do not hold, especially in large databases, so researchers
have suggested a three-step process for extracting an ontology
from a database [24]: preparation, extraction, and enrichment.

In the preparation step, we focus on two aspects of the
meta-data of database elements:

1. Completeness: having a complete understanding of, and
proper meta-data about, database elements. All database
entities should be labeled with a meaningful description, and
all relations between tables, even those hidden in
application code, should be specified;

2. Relevance: the relevance of every database entity to our
domain should be specified.

Table 3 shows the comparison of some highly cited
re-engineering methods based on their support for the
preparation aspects, the extraction rules, and the enrichment
step. To the best of our knowledge, there is no comprehensive
re-engineering method that is suitable for our goal, and thus
we planned to design a new method that supports all mapping
rules and the three steps of extraction.


Creating an ontology by merging existing ontologies: 
Existing ontologies are structured knowledge resources for 
ontology creation. Researchers have proposed several 
methods for ontology merging. These methods use at least 
one of the following approaches [39]: 

1. Structure based: In this approach ontologies are 
represented as directed labelled graphs and similarity 
comparison between a pair of classes from two 
ontologies is based on the analysis of their position 
within the graphs. One of the popular methods in this 
approach is PROMPT [40]. 

2. Terminological based: Terminological methods 
compare strings and can be applied to the name, the 
label, or comments of ontology entities. 

3. Instance based: These methods determine the 
similarity between concepts by examining the overlap 
of their instances. 

4. Background knowledge based: Only a few methods
consider background knowledge in the mapping
process, and they are limited to using knowledge in an
upper ontology [41], knowledge hidden in a corpus [42],
and the semantic web [43].

Most tools and techniques for ontology merging were
developed as part of a research project and were
customized to its needs [44]; therefore they become
outdated after a period of time. For example,
PROMPT used to be a pioneering tool in ontology merging;
however, it has not been updated in the past 10 years and the
current version of Protégé does not support it any more.
Moreover, there is no tool or technique in ontology merging
that supports the Persian language, especially in semantic
similarity search using WordNet or other methods. For
these reasons we decided to design a new method that
supports the structure-based and terminological approaches
and also uses background knowledge, where the results of
ontology learning from text and of extracting an ontology
from the database are the background knowledge.


Table 3. A comparison of re-engineering methods

Method | Preparation (Completeness, Relevance) | Extraction rules (1 2 3 4 5 6 7 8) | Enrichment
Shen, Huang, Zhu, & Zhao [25]: X X X X X X
Ghawi & Cullot [26]: X X X X X
Tirmizi, Sequeda, & Miranker [27]: X X X X X X
Cerbah [28]: manually define relevance of database entities; X X X X
Alalwan, Zedan, & Siewe [29]: X X X X X X X
Lubyte & Tessaris [30]: X X X X
Albarrak & Sibley [31]
Astrova [32]
Liu, Wang, Bao, & Wang [33]: X X X X
Santoso, Haw, & Abdul-Mehdi [34]: identifying hierarchical relation based on table contents; X X X X X X
Khan & Sonia [35]: X X X X X X
Blobel [36]: X X X
Kaulins & Borisov [37]: X X X X X X X X X
Zarembo [38]: X X X X X
[Figure 2 image: the ontology construction process (requirement specification, knowledge sources identification, terms extraction, terms conceptualization, database re-engineering, existing ontologies merge, evaluation) with its supporting modules for project, document, and electronic letter management]

Figure 2. The ontology construction integrated system 


2. Ontirandoc, an Integrated Methodology for Ontology 
Construction 

The second phase of DSR is suggestion. In this phase we
designed a tentative model of an integrated system, called
Ontirandoc, which can be used for ontology construction
from three types of knowledge sources. Ontirandoc is not
only a tool for creating and editing ontology files, but also a
methodology that contains a detailed process guideline,
methods, algorithms, and integrated modular software to
support the process¹.

In the development phase of DSR, we implemented our 
algorithms in an open source PHP web application. The 
implemented system was tested by input data (text 
documents, ERP database and existing ontologies) and the 
results were checked manually to find exceptions and errors. 
The system was evolved according to the results of the 
evaluation phase. 


After several rounds of the “suggestion - development
- evaluation” cycle in DSR, we reached our final integrated
system. This system has a modular design and is open source
to enable other researchers to upgrade or customize it
according to their specific needs.

The structure of the system is shown in Figure 2. The
model was designed in the ArchiMate² language, which is one
of the architecture description languages in ISO/IEC/IEEE
42010.

The proposed structure covers main activities for 
ontology construction and also provides a platform for 
collaborative ontology development. The main activities 
were adapted from Methontology [12], On-To-Knowledge 
[17] and Neon [8] methodologies. 

Several software modules were designed and
implemented in an integrated system to support the main
activities and the Persian language. The modular design
allows each module to be upgraded or customized
independently. All modules are integrated through the data
layer, as shown in Figure 3.


¹ According to the definition in [45], a methodology is “a comprehensive, integrated series of techniques or methods creating a general systems theory of how a class of thought intensive work ought to be performed”.
² http://pubs.opengroup.org/architecture/archimate3-doc/


Ontirandoc activities and related modules: 


1. Requirement specification: Almost all ontology
construction methodologies include this activity, whose
result is a document that specifies the goal, scope, and
requirements of the product.

2. Knowledge sources identification: Environment study
and feasibility test are two main tasks that are mentioned
in the On-To-Knowledge and Neon methodologies.

In Ontirandoc, these tasks are decomposed into four
activities: existing ontologies identification, domain expert
identification, related text documents identification, and
related database identification.

To identify existing ontologies, we designed the following
six-step guideline. Steps 1-4 are adapted from the ontology
dowsing document suggested by [46]:

• Checking lists of ontologies and services websites.
• Using semantic search engines (such as Swoogle¹).
• Checking ontology repositories.
• Checking mailing lists and online forums.

We extended the ontology dowsing guideline by adding two steps:

• When an ontology is found, investigate its code and, if it
uses elements of other ontologies, find and check the referred
ontologies.
• Search scientific articles that may have an ontology as a result.
3. Terms extraction: In this activity, ontology developers
extract terms based on the open coding technique of
content analysis methods [47]. The first time a developer
identifies a term, they can add it and its location (page,
paragraph and sentence) to the terms vocabulary by
using the Ontirandoc register terms user interface. The
location will be used in co-occurrence analysis. If
developers identify an existing term in the text, they can
select it from the vocabulary and add its new location, so the
system can calculate the TF-IDF of each term. Several
modules were designed and implemented to help
developers to:

• Identify previously extracted terms.

• Find similar existing terms before adding a new one.
This module shows both structural and semantic
similarity. Semantic similarity is identified by using
WordNet (in our case we used a Persian wordnet called
FerdowsNet²) and structural similarity is identified by
Levenshtein distance and prefix/suffix analysis.

• Merge similar terms.
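The structural part of the similar-term suggestion can be sketched as follows. This is an illustrative Python sketch, not the Ontirandoc PHP code: the function names, the `max_distance` threshold, and the simple prefix/suffix test are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_similar(term: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return known terms that are close in edit distance or share a prefix/suffix."""
    term_l = term.lower()
    hits = []
    for known in vocabulary:
        known_l = known.lower()
        distance = levenshtein(term_l, known_l)
        shares_affix = (known_l.startswith(term_l) or known_l.endswith(term_l)
                        or term_l.startswith(known_l) or term_l.endswith(known_l))
        if distance <= max_distance or shares_affix:
            hits.append((distance, known))
    # Closest candidates first.
    return [known for _, known in sorted(hits)]
```

A developer about to register "student" would then be shown entries such as "students" or "graduate student" if they already exist in the vocabulary. Semantic similarity through a wordnet is deliberately left out of this sketch.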

4. Terms conceptualization: The goal of this activity is
transferring terms to ontology entities. Developers may
create a new ontology entity for a term or just map the
term to an existing ontology element. A software
module calculates the TF-IDF value of the extracted terms and
shows them as a sorted list to the developer. A term with
a larger TF-IDF is more important in the domain. The
following software modules help developers in this
activity:

• Showing the references of a term in texts. When a term is
selected, this module shows all paragraphs that contain it.

• Showing semantically related terms (in the current version just
synonyms, hyponyms and hypernyms) for each selected
term by using WordNet and FerdowsNet. These lists
help developers to identify hierarchical or non-hierarchical
relations in the ontology.


[Figure 3 image: modules for task and document management, document term operations, ontology entity operations, database meta-data, and experts' contact information, connected through shared data stores]

Figure 3. Integration and relations between modules in data layer


¹ http://swoogle.umbc.edu
² http://wtlab.um.ac.ir/index.php?option=com_content&view=article&id=314&Itemid=200


• Showing similar terms (structural similarity) for each
selected term. This list helps developers to identify
relations between classes or properties of classes.

• Showing similar ontology elements in existing ontologies.
This assists developers in selecting a better ontology element
type by seeing how others have modeled it.

• Performing co-occurrence analysis to identify relations
between terms and their mapped ontology elements.
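The two statistical measures used in this activity can be sketched in Python as follows. The paper does not give the exact formulas, so this sketch assumes one common variant: a term's score is its total number of registered occurrences multiplied by log(N / df), where N is the number of documents and df the number of documents containing the term, and two terms co-occur when they are registered in the same paragraph. The tuple layout of `occurrences` is likewise hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tf_idf(occurrences, n_documents):
    """occurrences: iterable of (term, document, paragraph) registrations."""
    occurrences = list(occurrences)
    tf = Counter(term for term, _, _ in occurrences)
    docs = defaultdict(set)
    for term, doc, _ in occurrences:
        docs[term].add(doc)
    # Assumed variant: total frequency weighted by inverse document frequency.
    return {t: tf[t] * math.log(n_documents / len(docs[t])) for t in tf}


def co_occurrence(occurrences):
    """Count how often two terms were registered in the same paragraph."""
    by_paragraph = defaultdict(set)
    for term, doc, paragraph in occurrences:
        by_paragraph[(doc, paragraph)].add(term)
    pairs = Counter()
    for terms in by_paragraph.values():
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs
```

Sorting the `tf_idf` result by value gives the kind of ranked term list described above, and pairs with a high co-occurrence count are candidates for a relation between the mapped ontology elements.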

After all the terms are conceptualized, the following software
modules help developers to refine the resulting ontology:

• Showing all classes that have similar child classes and
asking the developer whether to merge them.

• Showing redundant properties/relations (those that exist in both
the parent and the child class) and asking the developer whether
to remove them.

• Showing similar relations between two classes and asking
the developer whether to merge them.
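The check for redundant properties, for example, amounts to comparing the properties of each class with those it inherits. The following Python sketch illustrates this under simplifying assumptions (single inheritance, properties identified by name); the function and argument names are hypothetical.

```python
def redundant_properties(parent_of, properties):
    """Find properties declared on a class that an ancestor already declares.

    parent_of:  child class name -> parent class name (single inheritance assumed)
    properties: class name -> set of property names declared on that class
    """
    findings = []
    for child in sorted(parent_of):
        inherited = set()
        seen = {child}  # guards against accidental cycles in the hierarchy
        ancestor = parent_of.get(child)
        while ancestor is not None and ancestor not in seen:
            inherited |= properties.get(ancestor, set())
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        for prop in sorted(properties.get(child, set()) & inherited):
            findings.append((child, prop))
    return findings
```

Each returned pair is only a suggestion; as in the refinement modules above, the developer decides whether the property is actually removed from the child class.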

5. Database re-engineering: This activity is designed in two
steps: preparation and extraction.

Ontirandoc relies on rich meta-data; therefore the
preparation step is designed to prepare such data. Rich
meta-data should have the following information about
database elements:

• All elements should have clear and meaningful labels that
describe their content and reason for existence. These labels
can be defined in Persian and English.

• The relatedness of each element to business domains should be
specified.

• All table relations should be specified (some of these
relations are defined in the database schema and some of
them are hidden in the application code).

To support enrichment of meta-data, several modules were
designed and implemented in Ontirandoc:

• Table content investigator: Researchers have proposed a
few solutions to extract the meaning of tables by analyzing
their contents, such as [34] and [48]; however, these
solutions are not efficient for large tables like those in
our database. The table content investigator module in
Ontirandoc does not apply any specific data mining or
other data processing algorithms and only allows ontology
developers to investigate table contents by applying
horizontal and vertical filters.

• Source code investigator: Most of the ambiguities in the
meaning of database entities can be resolved by investigating
the application source code [24]. Some table relations might
also be hidden in the source code. This module proposes a
practical solution to complete the meta-data by
investigating the application source code. Ontology
developers can use this module through a user interface
that allows them to complete the meta-data of a table through
the following features:

o Showing all source files that send queries with specific
table names to the DBMS¹ (this assumes that the module
has access to query log files). Ontology developers can
trace the usage of a table in source files and identify the
meaning of that table by reading the related source code.

o Showing the content of a source code file to the developer.

o Showing the evolution history of source code files (this
assumes that the module has access to the software project
management data). The history of a source file helps
ontology developers to find the reasons for the creation and
evolution of source code that is related to a table. It also
helps to discover the software developers who worked on that
source file, whom ontology developers may need to ask
about the usage of a table.

o Investigating the software configuration: Information
systems usually organize their features in system menus.
The relation between software menus and source code files
is a good knowledge source about the meaning of tables.
This module helps ontology developers to trace a menu
from the source files that use specific tables. The description
of a menu can tell ontology developers about the meaning
of tables, and ontology developers can also open those
menus in the running system and extract the meaning
from their UI².

o Suggesting table relations: The structural similarity
between a field name in one table and the primary key of
another table may reveal a foreign key that is not defined
in the database schema.


¹ Database Management System
² User Interface
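The table relation suggestion can be illustrated with a short Python sketch. It assumes a naming convention in which a foreign key field is named after the referenced table and its primary key (for example `department_id`), and it uses the standard-library `difflib.SequenceMatcher` as the similarity measure; the paper does not state which measure or threshold Ontirandoc applies, so both are assumptions, as are the function and argument names.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name and drop underscores so 'Dept_ID' and 'deptid' compare equal."""
    return name.lower().replace("_", "")


def suggest_foreign_keys(tables, declared=frozenset(), threshold=0.85):
    """Suggest undeclared foreign keys from field-name similarity.

    tables:   table name -> {"pk": primary key field, "fields": [field names]}
    declared: set of (table, field) pairs already declared as foreign keys
    """
    suggestions = []
    for src, src_info in tables.items():
        for fld in src_info["fields"]:
            if fld == src_info["pk"] or (src, fld) in declared:
                continue
            for dst, dst_info in tables.items():
                if dst == src:
                    continue
                # Assumed convention: <referenced table><primary key>.
                candidate = normalize(dst + dst_info["pk"])
                score = SequenceMatcher(None, normalize(fld), candidate).ratio()
                if score >= threshold:
                    suggestions.append((src, fld, dst, round(score, 2)))
    return suggestions
```

The output is a list of candidate relations for the ontology developer to confirm or reject, in line with the semi-automated design of the preparation step.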

In the extraction step of re-engineering, several algorithms
were designed to implement the 8 transfer rules that we
discussed before. These algorithms rely on the complete
meta-data that was prepared in the preparation step.

Figure 4 shows the algorithm for applying the default, weak
entities, and constraint rules (rules number 1, 4, and 7). The
key ideas of this algorithm are the handling of coding tables
and of restricted field values.

As presented in Figure 4, each non-key field is
transferred to a data property; in large databases, like
our case, the result therefore has too many data properties.
Here the prepared meta-data is very helpful. The Ontirandoc
extraction module adds all similar data properties to a list. Two
data properties are similar if their titles or labels (in Persian)
are structurally or semantically similar. Moreover, if two data
properties have the same list of permitted values, they might
also be similar. Ontology developers can review the list and
select which data properties should be merged.

Figure 5 shows the algorithm for applying the binary and
N-ary relationship rules (rules number 2 and 5).
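The decomposition of link tables into binary relations can be sketched in Python as follows. The sketch assumes that the link tables (tables that contain only primary and foreign keys) have already been selected and that each is described by the list of tables its foreign keys reference; the function name and data layout are hypothetical.

```python
from itertools import combinations


def relationship_properties(link_tables, class_of):
    """Turn link tables into object properties (rules 2 and 5).

    link_tables: table name -> list of referenced tables, one per foreign key
    class_of:    table name -> name of the class the table was mapped to
    """
    properties = []
    for table, referenced in link_tables.items():
        # A binary link table yields one pair; an N-ary one is broken
        # into all possible binary relations between its foreign keys.
        for left, right in combinations(referenced, 2):
            properties.append({"name": table,
                               "domain": class_of[left],
                               "range": class_of[right]})
    return properties
```

For a table `teaching` that links `professor`, `course`, and `term`, the sketch produces three domain/range pairs under the same property name; which of them are meaningful remains a decision for the ontology developer.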

Figure 6 shows the algorithm for applying the hierarchy and
fragmentation rules (rules number 3 and 6). This algorithm
identifies potential fragmented tables and hierarchy relations
according to the meta-data and allows the user to confirm or
reject the suggestions.

6. Merging existing ontologies: This activity has four steps:
labeling ontology elements, mapping similar ontology
elements, merging ontologies, and refining the result. Four
software modules were designed corresponding to these
steps.


The labeling step provides localized (each element has a
Persian label) and consistent (all identical elements have the
same label) ontologies. The Ontirandoc software modules and UI
help ontology developers to navigate between ontologies and
their elements, view structurally and semantically similar
elements, and add proper labels.

Because of the differences in naming and modeling view,
finding similar elements in different ontologies cannot be
fully automated and needs user intervention [49]. Figure 7
shows the workflow suggested by Ontirandoc for performing
this step.


1. Select all tables with at least one non-foreign key field and push them into array A.
2. While there is a table in array A: pull a table from array A and call it T. Create a class with this table title and add the table comment as a label to the class. Push all T fields into array B.
3. While there is a field in B: pull one field from B and call it F.
4. If F is a foreign key to another table (T2) and T2 is not a coding table: create an object property with F title. Set the domain of this property to the corresponding class of T and its range to the corresponding class of T2.
5. If T2 is a coding table: create a data property with F name and call it DP. Add F comment as a label to DP. For each valid value of F, add a new permitted value as a restriction for DP.
6. If F is not a foreign key: create a data property with F name and call it DP. Add F comment as a label to DP. If the permitted values for F are restricted according to the meta-data, add a new permitted value as a restriction for DP for each valid value of F.

Figure 4. Algorithm for rules number 1, 4, and 7
1. Select all tables that have only primary and foreign keys and push them into array A.
2. While there is a table in array A: pull a table from array A and call it T. Create a data property (call it P) with T title and T comment as a label to P.
3. Add all possible binary relations between the foreign keys to array B.
4. Pull each relation from B and add the corresponding classes of its two sides to the domain and range of P.

Figure 5. Algorithm for rules number 2 and 5


1. Select all tables whose primary key is not a foreign key to other tables and push them into array A.
2. While there is a table in array A: pull a table from array A and call it T. If there are tables whose primary keys are foreign keys to the primary key of T, push them into array B.
3. Pull one table from array B (call it T2) and show detailed information of the classes corresponding to T and T2 to the user.
4. Depending on the user decision: add a hierarchy relation between the corresponding classes; or add an object property and set its domain and range to the corresponding classes; or merge the corresponding classes.

Figure 6. Algorithm for rules number 3 and 6
1. Load the labeled ontologies into an array O[1..n].
2. For each pair of ontologies O(i) and O(j), create a queue named Cij for comparing their elements. Push all O(i) elements into the queue and set them to “un-decided” status. Repeat until all ontology pairs are compared.
3. Show the queues to the user, who selects a queue (Cij).
4. While an “un-decided” element exists in the queue: pull element “E” from the “un-decided” list and add a “No similar exists” entry to the mapping suggestions list. Then add to the list all elements of O(j) that are structurally similar, semantically similar, or similar according to their positions in the ontology graph, and show the suggestions list to the user.
5. If the user selects “No similar exists”, set the status of “E” to “No similar exists”. If the user selects one or more similar elements, register the mapping and set the status of “E” to “mapped”.
6. When no “un-decided” element remains, apply transitive and symmetry rules to map all possible elements in the queues.


Figure 7. Mapping process workflow 
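The last step of the mapping workflow, applying transitive and symmetry rules to the registered mappings, can be sketched with a union-find structure. This Python sketch is illustrative only: the function name and the `ontology:element` naming of elements are assumptions, and the paper does not say how Ontirandoc implements this step.

```python
def close_mappings(pairs):
    """Group elements connected by user-confirmed mappings.

    Treating each mapping as symmetric and transitive means that
    A = B and B = C also yields A = C (and C = A).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for element in list(parent):
        groups.setdefault(find(element), set()).add(element)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Given the confirmed pairs (O1:Student, O2:Learner) and (O2:Learner, O3:Pupil), the three elements end up in one group, so the O1/O3 queue no longer needs a user decision for that element.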


The merge step also needs user intervention. Figure 8
shows the algorithm suggested by Ontirandoc for merging
ontologies based on the results of the previous step. Having
enough documentation about ontology elements is very
important when an ontology is applied [7]. The merge
algorithm, like the other methods designed in Ontirandoc,
allows users to trace each ontology element to its source.

The last step in the merging activity is refinement. Because
of the differences in granularity, level of detail, and modeling
view of the source ontologies, the product of the previous step
may have some errors. The Ontirandoc methodology suggests the
following operations in the refinement step (these operations
can be performed by the software modules that are designed
and implemented in Ontirandoc):

• Identifying and investigating similar relations: If two 
classes have more than one semantic relation, these 
relations may be duplicates. These classes should be shown 
to the user so that the redundant relations can be merged or removed. 

• Identifying duplicate properties: Classes with a hierarchy 
relation should not both be in the domain or range of the same property. 
Because child classes inherit the properties of their parent 
classes, these duplications should be found and fixed. 

• Suggesting hierarchy relations: Classes whose number of common 
properties and relations exceeds a threshold may 
have a hierarchy relation. These classes should be shown to 
the user, who can select one of the following choices: 

o Selecting one class as the parent and removing all common 
properties and relations from the child classes; 

o Creating a new class as the parent of all selected classes, 
removing all common properties and relations from the 
child classes, and inserting them into the new class; 

o Doing nothing. 
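
The hierarchy suggestion described in the refinement operations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ontirandoc module: each class is represented only by a set of property names, and the function names hierarchy_candidates and extract_parent are hypothetical.

```python
from itertools import combinations


def hierarchy_candidates(class_props, threshold):
    """Return (class_a, class_b, common) for class pairs that share
    more than `threshold` properties."""
    candidates = []
    for a, b in combinations(sorted(class_props), 2):
        common = class_props[a] & class_props[b]
        if len(common) > threshold:
            candidates.append((a, b, common))
    return candidates


def extract_parent(class_props, children, parent_name):
    """Second choice in the list: create a new parent class that receives
    the properties common to all selected child classes."""
    common = set.intersection(*(class_props[c] for c in children))
    updated = {name: set(props) for name, props in class_props.items()}
    for child in children:
        updated[child] -= common      # remove the shared properties from the child
    updated[parent_name] = common     # and place them on the new parent
    return updated


class_props = {
    "Student": {"name", "birthDate", "studentId"},
    "Professor": {"name", "birthDate", "rank"},
    "Course": {"title"},
}
pairs = hierarchy_candidates(class_props, threshold=1)
refined = extract_parent(class_props, ["Student", "Professor"], "Person")
```

The candidate pairs are only suggestions; as in the methodology, the decision to restructure the classes stays with the user.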

3. Evaluation: Researchers have proposed several 
methods to evaluate an ontology. These methods can be 
classified into the following approaches [50]: comparing the ontology 
with a “golden standard”, evaluation based on the user, evaluation based on 
the application of the ontology, and comparison with the 
source of data. In our methodology, the evaluation activity is 
designed based on two approaches: 

[Flowchart text recovered at this position; the box labels repeat and extend the mapping workflow of Figure 7. Recoverable steps: load the labeled ontologies into an array O[1..n]; create a queue named Cij for comparing the elements of O(i) and O(j), push all O(i) elements into the queue, and set them to "un-decided" status; decision: are all ontology pairs compared?; show the queues to the user; the user selects a queue (Cij); decision: does an "un-decided" element exist in the queue?; pull element "E" from the "un-decided" list and add "No similar exists" to the mapping suggestions list; add all structurally similar elements in O(j), all semantically similar elements in O(j), and all elements in O(j) that are similar according to their positions in the ontology graph to the mapping suggestions list; show the suggestions list to the user; decision: what is the user's choice?; on "No similar exists", set the status of "E" to "No similar exists"; on one or more similar elements, register the mapping and set the status of "E" to "mapped"; apply transitive and symmetry rules to map all possible elements in the queues.]


Figure 8. Merge algorithm 
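
The flowchart text above mentions applying transitive and symmetry rules to the registered mappings. One common way to realize such rules is to treat the mappings as an equivalence relation and group the elements with a union-find structure. The sketch below assumes this reading; the function name equivalence_groups and the "ontology:element" identifiers are hypothetical.

```python
def equivalence_groups(mappings):
    """Group elements connected by mappings.

    mappings: iterable of (a, b) pairs meaning "a was mapped to b".
    Symmetry and transitivity are implied by placing both ends of every
    pair in the same set.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    # Return only groups that actually relate two or more elements.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


groups = equivalence_groups([
    ("O1:Student", "O2:Learner"),
    ("O2:Learner", "O3:Pupil"),
    ("O1:Course", "O3:Module"),
])
```

Each resulting group is a set of elements that a merge step could treat as one concept while still recording the source ontology of every member.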


Table 4. Ontologies created by Ontirandoc 

Ontology resource | Number of classes | Number of properties/relations 
Existing ontologies | 135 | 165 
Text | 83 | 172 
Database | 156 | 655 


• Comparing with a golden standard: Precision and recall are 
the two main measures that should be calculated [51]. A 
software module was designed and implemented to 
calculate these measures. It is worth noting that before 
two ontologies are compared, their elements must be labeled 
by using the Ontirandoc tools, as discussed before. 

• Based on user: The assertions technique is one of the methods 
in this approach. It allows users to investigate data 
model details by viewing them as a list of natural language 
assertions [52]. We adapted this technique, customized it 
to support the Persian language, and implemented a web-based 
software module that shows the details of an ontology as Persian-language 
assertions and collects users' opinions and comments. 
The users' feedback is aggregated and shown to developers 
for updating the ontology. 
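
As an illustration of the golden-standard comparison, precision and recall over labeled elements can be computed as below. The sketch assumes that two elements correspond when they carry the same label, which simplifies the labeling done with the Ontirandoc tools; the function name precision_recall is hypothetical.

```python
def precision_recall(candidate_labels, gold_labels):
    """Return (precision, recall) of a candidate ontology against a
    golden standard, comparing elements by label."""
    candidate, gold = set(candidate_labels), set(gold_labels)
    matched = candidate & gold
    # precision: share of candidate elements that also occur in the standard
    precision = len(matched) / len(candidate) if candidate else 0.0
    # recall: share of the standard's elements that the candidate covers
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


precision, recall = precision_recall(
    {"Student", "Course", "Professor", "Audit Board"},
    {"Student", "Course", "Thesis"},
)
```

With these toy labels, two of the four candidate elements occur in the standard, and two of the three standard elements are covered.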

In addition to checking the validity of the ontology by applying 
the above approaches, we designed and implemented a 
software module to calculate the quality of the target 
ontology based on the framework presented in [53]. 
The ontology quality measures implemented in this 
module are Number of Properties (NoP), Average 
Properties per Class (AP-C), Average Fanout of Classes 
(AF-C), Number of Roots (NoR), and Average Fanout of 
Root Classes (AF-R). 
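
The measures named above can be sketched on a small class hierarchy. The exact definitions belong to the framework in [53] and are not restated in this paper, so the sketch makes explicit assumptions: fanout is taken as the number of direct subclasses, a root is a class without a superclass, and AP-C is NoP divided by the number of classes (NoC). The last assumption is consistent with the values reported in Table 6 (for example, CERIF: 781 / 207 ≈ 3.77). The function name quality_measures is hypothetical.

```python
def quality_measures(parents, properties):
    """Compute simple structural measures for a class hierarchy.

    parents:    class name -> set of direct superclass names (empty for roots)
    properties: class name -> set of property names attached to that class
    """
    classes = list(parents)
    # Assumption: fanout of a class = number of its direct subclasses.
    fanout = {c: 0 for c in classes}
    for supers in parents.values():
        for parent in supers:
            fanout[parent] = fanout.get(parent, 0) + 1
    roots = [c for c in classes if not parents[c]]
    all_properties = {p for props in properties.values() for p in props}
    noc, nop, nor = len(classes), len(all_properties), len(roots)
    return {
        "NoC": noc,
        "NoP": nop,
        "NoR": nor,
        "AP-C": nop / noc if noc else 0.0,
        "AF-C": sum(fanout[c] for c in classes) / noc if noc else 0.0,
        "AF-R": sum(fanout[r] for r in roots) / nor if nor else 0.0,
    }


measures = quality_measures(
    parents={"Person": set(), "Student": {"Person"},
             "Professor": {"Person"}, "Course": set()},
    properties={"Person": {"name"}, "Student": {"studentId"},
                "Professor": {"rank"}, "Course": {"title", "credits"}},
)
```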


3. Constructing the Target Ontology: 

We used the Ontirandoc methodology to construct our target 
ontology. In the knowledge source identification activity, the 
following sources were identified: 

1. Existing ontologies: 8 related ontology OWL files on the 
web were identified by using the upgraded ontology 
dowsing method: 

• Common European Research Information Format (CERIF)¹ 
• Lehigh University Benchmark (LUMB)² 

1 http://www.eurocris.org/Uploads/Web%20pages/CERIF-1.6/CERIF_1.6_2.xsd 
2 http://swat.cse.lehigh.edu/onto/univ-bench.owl 

• Semantic Web for Research Communities (SWRC)¹ 
• Toronto University² 
• University Ontology³ 
• VIVO⁴ 
• National Current Research Information System for IRAN (SEMAT) [54] 
• Higher Education Reference Ontology (HERO)⁵ 

2. Text documents: “Statistical concepts of science, 
research and technology” [55] and “Statistics of Higher 
Education in Iran (Academic Year 2015-2016)” [56], 
both books published by the Higher Education Research and 
Planning Institute. 

3. Database: the Ferdowsi University of Mashhad ERP 
database. 

After performing all activities that precede the final integration 
of ontologies, we obtained three products from the three 
knowledge sources, as shown in Table 4. 

As Table 5 shows, comparing these ontologies 
with each other reveals that none of them fully covers the 
concepts and properties of the others. The first number in each cell is the 
percentage of classes in the row ontology that have 
corresponding classes in the column ontology, and the 
second number is the percentage of properties in the row 
ontology that have corresponding properties in the column 
ontology. 

The ontology constructed from the database has the 
most details (properties and relations). This is because 
an ERP database should contain almost all 
operational data structures of a specific domain. However, it does 
not cover about 30% of the concepts and properties of the two 
other ontologies. Some of these concepts are not modeled in 
the database because their corresponding business process is 
not automated, such as “Audit Board”, and others are 
superclasses that are spread over more than one table, such as 
“Publication”. The goal of Ontirandoc is to construct an 
ontology that is as comprehensive as possible, so the final 
activity integrates these three ontologies into 
the final ontology with 164 classes and 585 data and object 
properties (the OWL file of this ontology can be downloaded 
from GitHub⁶). 


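Table 5. Comparing overlap of ontologies on each other 
(first value: classes; second value: properties) 

Row ontology \ Column ontology | From existing ontologies | From text | From database 
From existing ontologies | - | 61.45% / 27.91% | 72.19% / 71.51% 
From text | 41.95% / 31.52% | - | 72.79% / 65.45% 
From database | 41.03% / 20.31% | 47.44% / 18.47% | - 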
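
The kind of pairwise overlap reported in Table 5 can be sketched as follows. The code assumes that elements correspond when their labels are equal and uses toy data rather than the ontologies of this paper; the function name overlap_matrix is hypothetical.

```python
def overlap_matrix(ontologies):
    """Return {(row, column): percent}, where percent is the share of
    elements of the row ontology that also occur in the column ontology.

    ontologies: name -> set of element labels (classes or properties).
    """
    result = {}
    for row, row_labels in ontologies.items():
        for col, col_labels in ontologies.items():
            if row == col or not row_labels:
                continue  # the diagonal is left empty, as in the table
            shared = len(row_labels & col_labels)
            result[(row, col)] = round(100 * shared / len(row_labels), 2)
    return result


matrix = overlap_matrix({
    "text": {"Student", "Course", "Thesis", "Journal"},
    "database": {"Student", "Course", "Invoice"},
})
```

The measure is not symmetric: the same two shared labels make up a different share of each ontology, which is why every pair of ontologies appears twice in such a table.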


No golden standard ontology exists for higher 
education in Iran, so we asked some colleagues to 
create a new ontology based on the MSRT information-gathering 
systems⁷. We assumed that these systems cover almost all 
data needed by MSRT, so this ontology may be used as a benchmark to 
calculate the recall measure of the ontology evaluation. The 
benchmark ontology was created by a simple manual 
re-engineering method, that is, by investigating user interface 
forms and transferring the forms and their elements to ontology 
elements. 
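
The manual re-engineering of forms into ontology elements can be pictured with a small sketch. The mapping rule used here (a form becomes a class, a plain field becomes a data property, and a field that refers to another form becomes an object property) is an assumption for illustration, as are the function name form_to_ontology and the example form.

```python
def form_to_ontology(form_name, fields):
    """Translate one user interface form into ontology elements.

    fields: list of (label, kind) pairs, where kind is "value" for a plain
    input field or the name of another form for a reference field.
    """
    element = {"class": form_name, "data_properties": [], "object_properties": []}
    for label, kind in fields:
        if kind == "value":
            element["data_properties"].append(label)
        else:
            # A reference to another form becomes a relation to that class.
            element["object_properties"].append((label, kind))
    return element


student = form_to_ontology(
    "Student",
    [("name", "value"), ("entranceYear", "value"), ("university", "University")],
)
```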

The created benchmark ontology has 55 classes and 155 
object and data properties⁸. 

The comparison between the final ontology and this 
benchmark shows that the final ontology covers the 
benchmark ontology elements far better than the 
existing ontologies do. As shown in the second column of Table 
6, it covers almost all elements of the benchmark ontology (96.19%). 
Moreover, three experts used the implemented user-based 
evaluation method that was mentioned earlier, and their 
comments show that the final ontology is valid. 

Table 6 shows the quality measures of the final ontology 
compared to the existing ontologies, which indicate its high quality. 


Table 6. Comparing the final ontology with other ontologies 

Ontology | Coverage of benchmark (recall measure) | AF-R | NoR | AF-C | AP-C | NoP | NoC 
VIVO | 41.43% | 50765 | 1 | 132.55 | 0.66 | 252 | 383 
CERIF | 36.19% | 85.5 | 2 | 0.83 | 3.77 | 781 | 207 
SEMAT | 35.71% | 16359 | 2 | 207.08 | 1.22 | 193 | 158 
HERO | 25.24% | 1769 | 2 | 63.18 | 2.52 | 141 | 56 
LUMB | 11.43% | 40 | 2 | 2 | 0.68 | 27 | 40 
SWRC | 18.57% | 704 | 1 | 13.28 | 1.06 | 56 | 53 
Toronto Ontology | 13.33% | 96 | 2 | 3.76 | 0.65 | 33 | 51 
University Ontology | 16.67% | 21.6 | 5 | 1.57 | 0.62 | 43 | 69 
Final ontology | 96.19% | 21744 | 1 | 132.59 | 6.35 | 1041 | 164 


1 http://swrc.ontoware.org/ontology 
2 http://www.cs.toronto.edu/semanticweb/maponto/MapontoExamples/univ-cs.owl 
3 http://www.webkursi.lv/luweb05fall/resources/university.owl 
4 http://vivoweb.org/files/vivo-isf-public-1.6.owl 
5 http://sourceforge.net/projects/heronto/ 
6 https://github.com/milanifard/HigherEducationOntology 
7 Higher Education System (http://hes.msrt.ir), SAHMA (https://portal.irphe.ac.ir) and SEMAT (http://www.semat.ir) 
8 https://github.com/milanifard/HigherEducationOntology 

4. Conclusion 

In this research, we created a reference ontology for the 
education and research domain of higher education in Iran. 
This ontology was constructed with a new methodology that 
was designed using the DSR method and contains an 
architecture, a detailed workflow process, and a guideline for 
performing each step. In order to implement and test this 
methodology (according to the DSR life cycle), we developed an 
integrated, modular, open-source, web-based system that 
supports all activities mentioned in our methodology. 

The designed system was implemented in over 40,000 
lines of PHP code. It can be downloaded from GitHub¹, and 
it is free to use, customize, and extend with new modules to support 
the special needs of other research projects. 

A reference ontology for the education and research 
organizations of the Ministry of Science, Research and 
Technology was built using the Ontirandoc methodology and its 
integrated system. This product was validated by experts and 
was also compared with the MSRT information needs 
(the benchmark ontology). The quality measures indicate that the final 
product has high quality. 